========================================================

Introduction: PROSPER LOAN DATA SET

This is a financial dataset: 113,937 loan listings described by 81 variables.

Brief background on Prosper:

Prosper was founded in 2005 as the first peer-to-peer lending marketplace in the United States. Since then, Prosper has facilitated more than $14 billion in loans to more than 880,000 people.

Through Prosper, people can invest in each other in a way that is financially and socially rewarding. Borrowers apply online for a fixed-rate, fixed-term loan between $2,000 and $40,000. Individuals and institutions can invest in the loans and earn attractive returns. Prosper handles all loan servicing on behalf of the matched borrowers and investors.

Prosper Marketplace is backed by leading investors including Sequoia Capital, Francisco Partners, Institutional Venture Partners, and Credit Suisse NEXT Fund.

Source: https://www.prosper.com/

Loading the dataset:

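library(ggplot2)
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which matches the str() output shown below
loan <- read.csv('/home/reshu/Desktop/prosperLoanData.csv', stringsAsFactors = TRUE)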
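
To illustrate the loading options without the full file, here is a minimal sketch that reads a tiny inline CSV. The `toy_csv` string and its values are invented for illustration and are not Prosper data; treating empty fields as missing via `na.strings` is an assumption about what is wanted, not something the original load does.

```r
# Hypothetical three-row extract; the values are invented for illustration.
toy_csv <- "ListingKey,CreditGrade,Term,BorrowerAPR
A1,C,36,0.165
B2,,36,0.120
C3,HR,60,
"

# stringsAsFactors = TRUE reproduces the pre-R-4.0 default (text -> factor).
# na.strings = c("", "NA") turns empty fields into NA instead of a "" level.
toy <- read.csv(text = toy_csv,
                stringsAsFactors = TRUE,
                na.strings = c("", "NA"))

str(toy)
```

With this option, blank text fields show up as `NA` rather than as an empty factor level, so `summary()` would count them alongside the other missing values.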

# Reading and understanding the data
str(loan)
## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$100,000+",..: 4 5 7 4 2 2 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
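
The `str()` output above shows date columns such as `ListingCreationDate` and `ClosedDate` stored as factors, which makes date arithmetic awkward. A minimal sketch of one way to convert them follows, using a toy data frame built from two rows of values that appear in this report; `toy` and the helper `to_date` are hypothetical names, and dropping the time of day is an assumption made for simplicity.

```r
# Toy stand-in for `loan`, with values copied from the head() output.
toy <- data.frame(
  ListingCreationDate = factor(c("2007-08-26 19:09:29.263000000",
                                 "2014-02-27 08:28:07.900000000")),
  ClosedDate = factor(c("2009-08-14 00:00:00", ""))
)

# Hypothetical helper: keep the first 10 characters ("YYYY-MM-DD") and parse
# them as a Date. Empty strings become NA.
to_date <- function(x) {
  as.Date(substr(as.character(x), 1, 10), format = "%Y-%m-%d")
}

toy$ListingCreationDate <- to_date(toy$ListingCreationDate)
toy$ClosedDate <- to_date(toy$ClosedDate)

# Days between listing and closing; NA for loans that are still open.
toy$DaysToClose <- as.numeric(toy$ClosedDate - toy$ListingCreationDate)

str(toy)
```

The same helper could be applied to the other date-like factor columns of `loan` if day-level precision is enough for the analysis.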
names(loan)
##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"
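
Several names in this list, such as `ProsperRating..numeric.`, contain runs of dots. That pattern is consistent with `read.csv()`'s default `check.names = TRUE`, which replaces characters that are not valid in R names with dots; the exact original headers are an assumption here. A small sketch of tidying such names is shown below, where `mangled` and `clean` are hypothetical variables.

```r
# A few of the dotted names from the names(loan) output.
mangled <- c("ProsperRating..numeric.",
             "ProsperRating..Alpha.",
             "ListingCategory..numeric.",
             "TradesNeverDelinquent..percentage.")

# Collapse each run of dots into one underscore, then drop a trailing underscore.
clean <- sub("_$", "", gsub("\\.+", "_", mangled))

print(clean)
```

Applying the same two calls to `names(loan)` would rename every column in place; names without dots pass through unchanged.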
summary(loan)
##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 
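
The summary above reports `NA's` for numeric columns (for example 29084 for `ProsperScore`), but blank factor levels, such as the 84984 empty `CreditGrade` values, are not counted as missing. The sketch below counts both kinds per column on a toy data frame built from the first six rows shown in this report; `toy`, `na_counts` and `blank_counts` are hypothetical names.

```r
# Toy stand-in for `loan`, with values taken from the first six rows.
toy <- data.frame(
  CreditGrade      = factor(c("C", "", "HR", "", "", "")),
  ProsperScore     = c(NA, 7, NA, 9, 4, 10),
  AmountDelinquent = c(472, 0, NA, 10056, 0, 0)
)

# Explicit NA values per column.
na_counts <- colSums(is.na(toy))

# Empty strings per column (these hide inside factor levels).
blank_counts <- sapply(toy, function(col) {
  sum(as.character(col) == "", na.rm = TRUE)
})

print(rbind(na = na_counts, blank = blank_counts))
```

Running the same two counts on `loan` gives a per-column view of how much data is effectively missing before any plotting.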
head(loan)
##                ListingKey ListingNumber           ListingCreationDate
## 1 1021339766868145413AB3B        193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1       1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A         81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A        658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2        909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D       1074836 2013-12-14 08:26:37.093000000
##   CreditGrade Term LoanStatus          ClosedDate BorrowerAPR BorrowerRate
## 1           C   36  Completed 2009-08-14 00:00:00     0.16516       0.1580
## 2               36    Current                         0.12016       0.0920
## 3          HR   36  Completed 2009-12-17 00:00:00     0.28269       0.2750
## 4               36    Current                         0.12528       0.0974
## 5               36    Current                         0.24614       0.2085
## 6               60    Current                         0.15425       0.1314
##   LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1      0.1380                      NA            NA              NA
## 2      0.0820                 0.07960        0.0249         0.05470
## 3      0.2400                      NA            NA              NA
## 4      0.0874                 0.08490        0.0249         0.06000
## 5      0.1985                 0.18316        0.0925         0.09066
## 6      0.1214                 0.11567        0.0449         0.07077
##   ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1                      NA                                 NA
## 2                       6                     A            7
## 3                      NA                                 NA
## 4                       6                     A            9
## 5                       3                     D            4
## 6                       5                     B           10
##   ListingCategory..numeric. BorrowerState    Occupation EmploymentStatus
## 1                         0            CO         Other    Self-employed
## 2                         2            CO  Professional         Employed
## 3                         0            GA         Other    Not available
## 4                        16            GA Skilled Labor         Employed
## 5                         2            MN     Executive         Employed
## 6                         1            NM  Professional         Employed
##   EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1                        2                True             True
## 2                       44               False            False
## 3                       NA               False             True
## 4                      113                True            False
## 5                       44                True            False
## 6                       82                True            False
##                  GroupKey              DateCreditPulled
## 1                         2007-08-26 18:41:46.780000000
## 2                                   2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4                                   2012-10-22 11:02:32
## 5                                   2013-09-14 18:38:44
## 6                                   2013-12-14 08:26:40
##   CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1                   640                   659     2001-10-11 00:00:00
## 2                   680                   699     1996-03-18 00:00:00
## 3                   480                   499     2002-07-27 00:00:00
## 4                   800                   819     1983-02-28 00:00:00
## 5                   680                   699     2004-02-20 00:00:00
## 6                   740                   759     1973-03-01 00:00:00
##   CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1                  5               4                         12
## 2                 14              14                         29
## 3                 NA              NA                          3
## 4                  5               5                         29
## 5                 19              19                         49
## 6                 21              17                         49
##   OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1                     1                          24                    3
## 2                    13                         389                    3
## 3                     0                           0                    0
## 4                     7                         115                    0
## 5                     6                         220                    1
## 6                    13                        1410                    0
##   TotalInquiries CurrentDelinquencies AmountDelinquent
## 1              3                    2              472
## 2              5                    0                0
## 3              1                    1               NA
## 4              1                    4            10056
## 5              9                    0                0
## 6              2                    0                0
##   DelinquenciesLast7Years PublicRecordsLast10Years
## 1                       4                        0
## 2                       0                        1
## 3                       0                        0
## 4                      14                        0
## 5                       0                        0
## 6                       0                        0
##   PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1                         0                      0                0.00
## 2                         0                   3989                0.21
## 3                        NA                     NA                  NA
## 4                         0                   1444                0.04
## 5                         0                   6193                0.81
## 6                         0                  62999                0.39
##   AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1                    1500          11                               0.81
## 2                   10266          29                               1.00
## 3                      NA          NA                                 NA
## 4                   30754          26                               0.76
## 5                     695          39                               0.95
## 6                   86509          47                               1.00
##   TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 1                       0              0.17 $25,000-49,999
## 2                       2              0.18 $50,000-74,999
## 3                      NA              0.06  Not displayed
## 4                       0              0.15 $25,000-49,999
## 5                       2              0.26      $100,000+
## 6                       0              0.36      $100,000+
##   IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 1             True            3083.333 E33A3400205839220442E84
## 2             True            6125.000 9E3B37071505919926B1D82
## 3             True            2083.333 6954337960046817851BCB2
## 4             True            2875.000 A0393664465886295619C51
## 5             True            9583.333 A180369302188889200689E
## 6             True            8333.333 C3D63702273952547E79520
##   TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1                NA                         NA                    NA
## 2                NA                         NA                    NA
## 3                NA                         NA                    NA
## 4                NA                         NA                    NA
## 5                 1                         11                    11
## 6                NA                         NA                    NA
##   ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
## 4                                  NA                              NA
## 5                                   0                               0
## 6                                  NA                              NA
##   ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1                       NA                          NA
## 2                       NA                          NA
## 3                       NA                          NA
## 4                       NA                          NA
## 5                    11000                      9947.9
## 6                       NA                          NA
##   ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1                          NA                         0
## 2                          NA                         0
## 3                          NA                         0
## 4                          NA                         0
## 5                          NA                         0
## 6                          NA                         0
##   LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1                            NA                         78      19141
## 2                            NA                          0     134815
## 3                            NA                         86       6466
## 4                            NA                         16      77296
## 5                            NA                          6     102670
## 6                            NA                          3     123257
##   LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1               9425 2007-09-12 00:00:00                Q3 2007
## 2              10000 2014-03-03 00:00:00                Q1 2014
## 3               3001 2007-01-17 00:00:00                Q1 2007
## 4              10000 2012-11-01 00:00:00                Q4 2012
## 5              15000 2013-09-20 00:00:00                Q3 2013
## 6              15000 2013-12-24 00:00:00                Q4 2013
##                 MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA             330.43            11396.14
## 2 1D13370546739025387B2F4             318.93                0.00
## 3 5F7033715035555618FA612             123.32             4186.63
## 4 9ADE356069835475068C6D2             321.45             5143.20
## 5 36CE356043264555721F06C             563.97             2819.85
## 6 874A3701157341738DE458F             342.37              679.34
##   LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1                      9425.00            1971.14        -133.18
## 2                         0.00               0.00           0.00
## 3                      3001.00            1185.63         -24.20
## 4                      4091.09            1052.11        -108.01
## 5                      1563.22            1256.63         -60.27
## 6                       351.89             327.45         -25.33
##   LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1                 0                     0                   0
## 2                 0                     0                   0
## 3                 0                     0                   0
## 4                 0                     0                   0
## 5                 0                     0                   0
## 6                 0                     0                   0
##   LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1                               0             1               0
## 2                               0             1               0
## 3                               0             1               0
## 4                               0             1               0
## 5                               0             1               0
## 6                               0             1               0
##   InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1                          0                           0       258
## 2                          0                           0         1
## 3                          0                           0        41
## 4                          0                           0       158
## 5                          0                           0        20
## 6                          0                           0         1
dim(loan)
## [1] 113937     81
#We observe a lot of NAs. We must avoid analysing the columns with NAs

Observation:

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
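The NAs visible in the head() output above can be counted per column before deciding which variables to analyse. The sketch below is only an illustration on a small made-up data frame (`loan_demo` and its values are hypothetical stand-ins, not the Prosper data); the same calls should work on `loan`.

```r
# Hypothetical miniature of the loan table (values invented for illustration).
loan_demo <- data.frame(
  BorrowerAPR       = c(0.165, 0.120, NA, 0.125),
  ProsperScore      = c(NA, 7, NA, 9),
  TotalProsperLoans = c(NA, NA, NA, 1),
  Term              = c(36, 36, 36, 60)
)

# Number of missing values in each column.
na_count <- colSums(is.na(loan_demo))

# Share of missing values in each column, largest first.
na_share <- sort(na_count / nrow(loan_demo), decreasing = TRUE)

na_count
na_share
```

Columns near the top of `na_share` are the ones where any plot silently drops most of the rows.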

UNIVARIATE PLOTS -

BorrowerAPR:

A number of factors—such as term, type of interest rate etc.—can affect the cost of credit and make it hard to compare multiple loans. The APR makes comparison shopping easier. It’s a common unit of measurement for loans.

The APR figures in not just your interest rate, but also some fees associated with your loan over its lifetime. At Prosper, this means the closing fee charged when you first borrow the money. This closing fee is paid out of the loan proceeds when the loan originates.

Distribution of BorrowerAPR-

summary(loan$BorrowerAPR)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229      25
ggplot(aes(x = BorrowerAPR), data = loan) +
  geom_histogram(fill = '#3CB371') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 25 rows containing non-finite values (stat_bin).

The distribution is roughly normal, apart from a few peaks on the right side. The summary above reports 25 NAs in BorrowerAPR, so it is better to filter out those 25 rows.
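One possible way to drop the NA rows and to answer the `stat_bin()` message is sketched below. The data frame `loan_demo` is a hypothetical stand-in with invented values, and the bin width of 0.01 (one percentage point of APR) is an assumption rather than a recommended setting.

```r
library(ggplot2)

# Hypothetical stand-in for the loan data (random values, five NAs appended).
set.seed(1)
loan_demo <- data.frame(BorrowerAPR = c(runif(200, 0.05, 0.40), rep(NA, 5)))

# Keep only the rows where BorrowerAPR is present.
apr_clean <- subset(loan_demo, !is.na(BorrowerAPR))

# An explicit binwidth replaces the default of 30 bins.
p <- ggplot(aes(x = BorrowerAPR), data = apr_clean) +
  geom_histogram(binwidth = 0.01, fill = '#3CB371')

nrow(apr_clean)
```

Printing `p` draws the histogram; because the NAs are removed beforehand, no "Removed rows" warning should appear.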

CreditGrade:

The Credit rating that was assigned at the time the listing went live. Applicable for listings pre-2009 period and will only be populated for those listings.

ggplot(aes(x = CreditGrade), data = loan) +
  geom_bar(color = 'black', fill = '#3CB371') 

#removing blank values

ggplot(aes(x = CreditGrade), data = loan) +
  geom_bar(color = 'black', fill = '#3CB371') +
  scale_x_discrete(limits = c('A', 'AA', 'B', 'C', 'D','E','HR', 'NC'))
## Warning: Removed 84984 rows containing non-finite values (stat_count).

Observation:

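We observe that the count of NC (no credit) is very small, which implies that only very few listings received this grade. The 84,984 rows removed from the plot are listings with a blank CreditGrade, which is populated only for pre-2009 listings.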

Let’s limit the axis further by removing NC

#removing blank values and NC

ggplot(aes(x = CreditGrade), data = loan) +
  geom_bar(color = 'black', fill = '#3CB371') +
  scale_x_discrete(limits = c('A', 'AA', 'B', 'C', 'D','E','HR'))
## Warning: Removed 85125 rows containing non-finite values (stat_count).
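An alternative to setting axis limits in every plot is to fix the grade order once in the data. The sketch below assumes the grades run from AA (lowest risk) to HR (highest risk); the sample vector is invented for illustration.

```r
# Hypothetical credit grades; "" is a blank entry and "NC" means no credit.
grades <- c("C", "", "HR", "AA", "A", "", "NC", "B")

# Assumed risk order, from lowest to highest risk.
grade_levels <- c("AA", "A", "B", "C", "D", "E", "HR")

# Values that are not listed in `levels` (here "" and "NC") become NA.
grade_f <- factor(grades, levels = grade_levels)

table(grade_f)
sum(is.na(grade_f))
```

With the column stored this way, bar charts and boxplots follow the AA to HR order without an extra `scale_x_discrete()` call.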

ProsperScore:

A custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score. Applicable for loans originated after July 2009.

ggplot(aes(x = ProsperScore), data = loan) +
  geom_histogram(color = 'black', fill = '#228B22') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 29084 rows containing non-finite values (stat_bin).

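This histogram looks spiky because ProsperScore takes only whole-number values while the default 30 bins leave empty gaps between them (the NAs are simply dropped). Let’s plot a bar graph, which is more suitable for a non-continuous column.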

ggplot(aes(x = ProsperScore), data = loan) +
  geom_bar(color = 'black', fill = '#228B22') 
## Warning: Removed 29084 rows containing non-finite values (stat_count).

summary(loan$ProsperScore)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    4.00    6.00    5.95    8.00   11.00   29084
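The summary above reports a maximum of 11, although the description gives a range of 1-10. A frequency table is a quick way to see how many values fall outside the documented range. The vector below is invented for illustration and says nothing about the real counts.

```r
# Hypothetical Prosper scores (invented sample with one NA).
score <- c(1, 4, 6, 8, 11, NA, 10, 6)

# Frequency of every distinct value, with NA reported separately.
table(score, useNA = "ifany")

# Flag values outside the documented 1-10 range, ignoring NAs.
out_of_range <- !is.na(score) & (score < 1 | score > 10)
sum(out_of_range)
```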

LoanStatus: The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue.

Since it is a categorical variable, we observe its bar graph.

Distribution of LoanStatus

ggplot(aes(x = LoanStatus), data = loan) +
  geom_bar(color = 'black', fill = '#3CB371') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This graph shows that most loans are in the ‘Current’ state, i.e. they are still ongoing; the next most common status is ‘Completed’.

StatedMonthlyIncome:

The monthly income the borrower stated at the time the listing was created. Since it is a numeric variable, we investigate the histogram.

ggplot(aes(x = StatedMonthlyIncome), data = loan) +
  geom_histogram(fill = '#7CFC00') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#since it contains lot of outliers, let's limit the x-axis

ggplot(aes(x = StatedMonthlyIncome), data = loan) +
  geom_histogram(fill = '#3CB371') +
    scale_x_continuous(limits = c(0,100000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 17 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

Observation:

The distribution of StatedMonthlyIncome is right skewed, as observed in the histogram. [Mean > Median]
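The mean versus median comparison, and a data-driven alternative to a hard-coded axis limit, can be sketched as follows. The income values are invented for illustration, and the choice of the 95th percentile is an assumption.

```r
# Hypothetical monthly incomes with one extreme value (invented sample).
income <- c(2083, 2875, 3083, 4500, 6125, 8333, 9583, 150000)

mean(income)    # pulled upward by the extreme value
median(income)  # barely affected by the extreme value

# Upper limit taken from the data: the 95th percentile.
upper <- quantile(income, probs = 0.95, na.rm = TRUE)

# Values at or below the limit.
trimmed <- income[income <= upper]
length(trimmed)
```

A mean well above the median is the usual numeric sign of a right-skewed distribution.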

Distribution of Loan Amount

LoanOriginalAmount:

The origination amount of the loan.

ggplot(aes(x = LoanOriginalAmount), data = loan) +
  geom_histogram(fill = '#3CB371') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This distribution is right skewed, as the histogram shows, apart from a couple of peaks. We also notice that it is quite rare for borrowers to ask for very large loans through Prosper.

EmploymentStatus:

The employment status of the borrower at the time they posted the listing. Since the data is categorical, we use a bar chart.

ggplot(aes(x = EmploymentStatus), data = loan) +
  geom_bar(color = 'black', fill = '#3CB371') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

As can be observed, most borrowers are Employed. The first category, which has no label, corresponds to blank (missing) values.
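The unlabeled category can be turned into a proper missing value before plotting. This is a minimal sketch on an invented character vector; `status` and `status_f` are hypothetical names.

```r
# Hypothetical employment statuses; "" marks a blank entry (invented sample).
status <- c("Employed", "", "Self-employed", "Not available", "Employed", "")

# Recode empty strings as NA so they count as missing, not as a category.
status[status == ""] <- NA
status_f <- factor(status)

table(status_f, useNA = "ifany")
```

For a column that is already a factor, calling `droplevels()` after the recoding removes the now unused empty level. Passing `na.strings = c("", "NA")` to `read.csv()` is another option that handles blanks at load time.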

Distribution of Estimated Loss

EstimatedLoss:

Estimated loss is the estimated principal loss on charge-offs. Applicable for loans originated after July 2009.

ggplot(aes(x = EstimatedLoss), data = loan) +
  geom_histogram(color = 'black', fill = '#3CB371') 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 29084 rows containing non-finite values (stat_bin).

ggplot(aes(x = EstimatedLoss), data = loan) +
  geom_histogram(color = 'black', fill = '#3CB371') +
  scale_x_continuous(limits = c(0.0, 0.2))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 29530 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

BIVARIATE PLOTS -

CreditGrade - The Credit rating that was assigned at the time the listing went live.

BorrowerAPR - The Borrower’s Annual Percentage Rate (APR) for the loan.

#Jitter
ggplot(aes(x = CreditGrade, y = BorrowerAPR), data = loan) +
  geom_jitter(alpha = 1/20, color = 'darkgreen') 
## Warning: Removed 25 rows containing missing values (geom_point).

Since CreditGrade is categorical, let’s plot a boxplot, as that would be more suitable.

#Boxplots
#Removing NAs category

ggplot(aes(x = CreditGrade, y = BorrowerAPR), data = loan) +
  geom_boxplot(alpha = 1/20, color = 'darkgreen') +
  scale_x_discrete(limits = c('A', 'AA', 'B', 'C', 'D','E','HR', 'NC'))
## Warning: Removed 84984 rows containing missing values (stat_boxplot).
## Warning: Removed 25 rows containing non-finite values (stat_boxplot).

The highest median BorrowerAPR is for Credit Grade “HR”; the lowest median is for Credit Grade “AA”.

InquiriesLast6Months:

Number of inquiries in the past six months at the time the credit profile was pulled.

#Jitter

ggplot(aes(x = InquiriesLast6Months, y = BorrowerAPR), data = loan) +
  geom_jitter(alpha= 1/15, color = 'limegreen', size =4) +
  xlim(0, 20)
## Warning: Removed 25887 rows containing missing values (geom_point).

Using the scatterplot, we observe no linear correlation between BorrowerAPR and InquiriesLast6Months.

CreditScoreRangeLower - The lower value representing the range of the borrower’s credit score as provided by a consumer credit rating agency.

CreditScoreRangeUpper - The upper value representing the range of the borrower’s credit score as provided by a consumer credit rating agency.

#Jitter

ggplot(aes(x =CreditScoreRangeUpper, y = BorrowerAPR), data = loan) +
  geom_jitter(alpha= 1/20,color = 'darkgreen') 
## Warning: Removed 591 rows containing missing values (geom_point).

#Jitter

ggplot(aes(x =CreditScoreRangeLower, y = BorrowerAPR), data = loan) +
  geom_jitter(alpha= 1/20, color = 'darkgreen')
## Warning: Removed 591 rows containing missing values (geom_point).

Both plots show similar trends. No clear linear correlation between the credit score range and BorrowerAPR is visible in these plots.
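A visual impression of correlation can be backed up with a number. The sketch below computes a Pearson correlation on invented values (the vectors are hypothetical and say nothing about the real data); on the actual data frame the same call would take `loan$CreditScoreRangeLower` and `loan$BorrowerAPR`.

```r
# Hypothetical credit scores and APRs, including one missing score (invented sample).
credit_score <- c(640, 680, 480, 800, 680, 740, NA)
borrower_apr <- c(0.165, 0.120, 0.283, 0.125, 0.246, 0.154, 0.200)

# Pearson correlation, using only the rows where both values are present.
r <- cor(credit_score, borrower_apr, use = "complete.obs")
r
```

Values near -1 or 1 indicate a strong linear relationship, values near 0 a weak one.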

CurrentCreditLines - Number of current credit lines at the time the credit profile was pulled.

OpenCreditLines - Number of open credit lines at the time the credit profile was pulled.

#Jitter

ggplot(aes(x = CurrentCreditLines, y = OpenCreditLines), data = loan) +
  geom_jitter(alpha= 1/10, color = 'limegreen')
## Warning: Removed 7604 rows containing missing values (geom_point).

Using the scatterplot, we can observe that there is a linear correlation between CurrentCreditLines and OpenCreditLines.

LoanOriginalAmount : The origination amount of the loan. Term : The length of the loan expressed in months.

#Boxplots
#DebtToIncomeRatio by IncomeRange

ggplot(aes(x = IncomeRange, y = DebtToIncomeRatio), data = loan) +
  geom_boxplot(alpha = 1/20, color = 'darkgreen') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 8554 rows containing non-finite values (stat_boxplot).

As can be observed, the income range for borrowers who are not employed has a very wide spread of debt-to-income ratio, presumably because debt far exceeds income for them.

#Boxplots
#LoanOriginalAmount by IncomeRange

ggplot(aes(x = IncomeRange, y = LoanOriginalAmount), data = loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

As expected, borrowers in the $100,000+ income range take the largest loans.
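What the boxplot suggests can be checked numerically with a grouped median. The sketch below uses an invented data frame (`demo` is hypothetical; its income range labels are assumed to match those in the data) and also orders the ranges so that tables and plots read from low to high income.

```r
# Hypothetical loans (invented sample).
demo <- data.frame(
  IncomeRange = c("$25,000-49,999", "$100,000+", "$25,000-49,999",
                  "$100,000+", "$50,000-74,999", "$50,000-74,999"),
  LoanOriginalAmount = c(4000, 15000, 6000, 25000, 10000, 8000)
)

# Put the income ranges in increasing order instead of alphabetical order.
range_order <- c("$25,000-49,999", "$50,000-74,999", "$100,000+")
demo$IncomeRange <- factor(demo$IncomeRange, levels = range_order)

# Median loan amount within each income range.
median_by_range <- tapply(demo$LoanOriginalAmount, demo$IncomeRange, median)
median_by_range
```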

#Boxplots


ggplot(aes(x = IncomeRange, y = EstimatedLoss), data = loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 29084 rows containing non-finite values (stat_boxplot).

As expected, the estimated loss is highest for borrowers who are not employed.

ggplot(aes(x = IncomeRange, y = CreditGrade), data = loan) +
  geom_jitter(alpha = 1/20, color = 'darkgreen') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_discrete(limits = c('A', 'AA', 'B', 'C', 'D','E','HR', 'NC'))
## Warning: Removed 84984 rows containing missing values (geom_point).

As expected, borrowers who are not employed do not appear in any Credit Grade at all, possibly because they do not satisfy the required criteria.

Term- The length of the loan expressed in months

ggplot(loan, aes(Term, LoanOriginalAmount, group = Term)) +
  geom_boxplot() +
  scale_x_continuous(breaks = c(0,12,36,60)) +
  theme_minimal()

Borrowers with a 5-year (60-month) term seem to be more credit-worthy on average, judging by the loan amounts they obtain.

CurrentCreditLines- Number of current credit lines at the time the credit profile was pulled.

ggplot(aes(x = IncomeRange, y = CurrentCreditLines), data =  loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 55, hjust = 1))
## Warning: Removed 7604 rows containing non-finite values (stat_boxplot).

As observed, and as one would also expect, borrowers who are not employed have few credit lines, whereas borrowers in the highest income range have many.

ggplot(aes(x = IncomeRange, y = OpenCreditLines), data =  loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 55, hjust = 1))
## Warning: Removed 7604 rows containing non-finite values (stat_boxplot).

Making a new column “year” from “DateCreditPulled”

loan$year <- format(as.Date(loan$DateCreditPulled), "%Y") 
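The date handling can be tried in isolation. The two strings below are in the DateCreditPulled format shown in the head() output; the variable names are hypothetical.

```r
# Two timestamp strings in the format used by DateCreditPulled.
pulled <- c("2007-08-26 18:41:46.780000000", "2014-02-27 08:28:14")

# as.Date() reads the leading yyyy-mm-dd part; the time of day is ignored.
pulled_date <- as.Date(pulled)

year  <- format(pulled_date, "%Y")  # four-digit year as text
month <- format(pulled_date, "%m")  # two-digit month as text

year
month
```

Both results are character vectors, which suits a discrete axis; `as.integer(year)` gives numbers if arithmetic on the year is needed.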
#zooming out


ggplot(aes(x = year, y = LoanOriginalAmount), data = loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_discrete(limits = c('2008', '2009', '2010', '2011', '2012','2013','2014'))
## Warning: Removed 17888 rows containing missing values (stat_boxplot).

On average, the loan amount taken is highest in 2013. Most borrowers took their loans in 2013, with 2014 second in that respect.
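A boxplot shows how the amounts are distributed within each year but not how many loans each year contains, so counts are better checked separately. A minimal sketch on an invented data frame (`demo` and its values are hypothetical):

```r
# Hypothetical loans with a year column (invented sample).
demo <- data.frame(
  year = c("2012", "2013", "2013", "2013", "2014", "2014"),
  LoanOriginalAmount = c(10000, 15000, 12000, 9000, 10000, 14000)
)

# Number of loans per year.
loans_per_year <- table(demo$year)

# Mean loan amount per year.
mean_amount <- tapply(demo$LoanOriginalAmount, demo$year, mean)

loans_per_year
mean_amount
```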

ggplot(aes(x = year, y = BorrowerAPR), data = loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 
## Warning: Removed 25 rows containing non-finite values (stat_boxplot).

The yearly distribution of BorrowerAPR is irregular: there is no steady year-on-year increase or decrease.

loan$month <- format(as.Date(loan$DateCreditPulled), "%m") 





ggplot(aes(x = month, y = LoanOriginalAmount), data = loan) +
  geom_boxplot(alpha = 1/20) +
  theme(axis.text.x = element_text(angle = 40, hjust = 1)) 

Looking at the loan amount month by month, the largest amounts are attributed to January (01) and December (12).

MULTIVARIATE PLOTS

ggplot(aes(x = DebtToIncomeRatio, y = BorrowerRate), data = loan) +
  geom_point(aes(color = CreditGrade),
             position = position_jitter(h = 0), alpha = 0.6) +
  scale_colour_brewer(palette = "BuGn")  + theme_dark() +
  xlim(0, 7)
## Warning: Removed 8857 rows containing missing values (geom_point).

ggplot(aes(x = DebtToIncomeRatio, y = BorrowerRate), data = loan) +
  geom_point(aes(color = CreditGrade),
             position = position_jitter(h = 0), alpha = 0.6) +
  scale_colour_brewer(palette = "BuGn")  + theme_dark() +
  geom_smooth(aes(color = CreditGrade), se = FALSE) +
  xlim(0,7)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 8847 rows containing non-finite values (stat_smooth).
## Warning: Removed 8858 rows containing missing values (geom_point).

This is a great plot with a lot of information. Here we have a scatter plot of the borrower’s interest rate (BorrowerRate) against the debt-to-income ratio of the borrower, with the colors describing the Credit Grade given to the particular loan. The first interesting thing to notice is that ‘A’ category loans seem to have lower rates and a smaller range of debt-to-income ratios, both of which indicate less risk. Also, there is an unusual horizontal line in the ‘E’ category that extends past 1 up to 7.

ggplot(aes(y = EstimatedLoss, x = DebtToIncomeRatio), data = loan) +
  geom_point(aes(color = ProsperScore), alpha = 0.6) +
  xlim(0, 3) 
## Warning: Removed 36491 rows containing missing values (geom_point).

Here we have a scatter plot of Estimated Loss (the estimated principal loss on charge-offs) against the debt-to-income ratio of the borrower, with the colors describing the Prosper Score (a custom risk score built using historical Prosper data; it ranges from 1-10, with 10 being the best, or lowest risk, score) given to the particular loan.

When both EstimatedLoss and DebtToIncomeRatio are low, the Prosper Score is highest, i.e. the risk is lowest.

ggplot(aes(y = EstimatedReturn, x = DebtToIncomeRatio), data = loan) +
  geom_point(aes(color = ProsperScore), alpha = 0.6) 
## Warning: Removed 36380 rows containing missing values (geom_point).

Here we have a scatter plot of Estimated Return (the difference between the Estimated Effective Yield and the Estimated Loss Rate) against the debt-to-income ratio of the borrower, with the colors describing the Prosper Score.

Estimated Effective Yield - Effective yield is equal to the borrower interest rate (i) minus the servicing fee rate, (ii) minus estimated uncollected interest on charge-offs, (iii) plus estimated collected late fees.

Estimated Loss Rate - The estimated principal loss on charge-offs
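The two definitions above combine into a short piece of arithmetic. All numbers and variable names below are invented for illustration and are not taken from the data set.

```r
# Hypothetical annual rates (invented for illustration).
borrower_rate        <- 0.200  # borrower interest rate
servicing_fee_rate   <- 0.010  # (i) servicing fee rate
uncollected_interest <- 0.015  # (ii) estimated uncollected interest on charge-offs
collected_late_fees  <- 0.002  # (iii) estimated collected late fees
estimated_loss_rate  <- 0.080  # estimated principal loss on charge-offs

# Effective yield: rate minus (i), minus (ii), plus (iii).
effective_yield <- borrower_rate - servicing_fee_rate -
  uncollected_interest + collected_late_fees

# Estimated return: effective yield minus the estimated loss rate.
estimated_return <- effective_yield - estimated_loss_rate

effective_yield
estimated_return
```

With these made-up inputs the effective yield works out to 0.177 and the estimated return to 0.097.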

Prosper Score is better for a low DebtToIncomeRatio, given that the Estimated Return is higher.

ggplot(aes(y = OnTimeProsperPayments, x = DebtToIncomeRatio), data = loan) +
  geom_point(aes(color = ProsperScore), alpha = 0.6) +
  xlim(0, 2)
## Warning: Removed 94066 rows containing missing values (geom_point).

OnTimeProsperPayments - Number of on time payments the borrower had made on Prosper loans at the time they created this listing. This value will be null if the borrower has no prior loans.

Prosper Score is high for a low debt-to-income ratio, given that the borrower has no prior loans or only a small number of prior loans at listing time.

Final 3 Plots and Summary

PLOT 1

library(gridExtra)
p1 <- ggplot(aes(x = year, y = BorrowerAPR), data = loan) +
  geom_boxplot(alpha = 1/20, color = 'limegreen') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p2 <- ggplot(aes(x = year, y = LoanOriginalAmount), data = loan) +
  geom_boxplot(alpha = 1/20, color = 'orange') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p3 <- ggplot(aes(x = year, y = BorrowerRate), data = loan) +
  geom_boxplot(alpha = 1/20, color = 'blue') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))


grid.arrange(p1, p2, p3, ncol = 1)
## Warning: Removed 25 rows containing non-finite values (stat_boxplot).

Description :

Borrower Rate and Loan Original Amount differ from year to year, and thus BorrowerAPR (the Borrower’s Annual Percentage Rate for the loan) differs too.

On an annual basis, the BorrowerAPR box plots (green) are nearly identical to the BorrowerRate box plots (blue).

The LoanOriginalAmount box plots (orange) follow a different annual trend from the other two plots.

PLOT 2

library(gridExtra)
p1 <- ggplot(aes(y = EstimatedLoss, x = DebtToIncomeRatio), data = loan) +
  geom_point(aes(color = ProsperScore), alpha = 0.6) 
  

p2 <- ggplot(aes(y = EstimatedReturn, x = DebtToIncomeRatio), data = loan) +
  geom_point(aes(color = ProsperScore), alpha = 0.6) 
 

p3 <- ggplot(aes(y = OnTimeProsperPayments, x = DebtToIncomeRatio), data = loan) +
  geom_point(aes(color = ProsperScore), alpha = 0.6) 
 
grid.arrange(p1, p2, p3, ncol = 1) 
## Warning: Removed 36380 rows containing missing values (geom_point).

## Warning: Removed 36380 rows containing missing values (geom_point).
## Warning: Removed 94022 rows containing missing values (geom_point).

Description :

When the debt-to-income ratio is low and EstimatedLoss, OnTimeProsperPayments and EstimatedReturn are also low, more light blue dots are visible, which denotes a higher Prosper Score.

PLOT 3

ggplot(data = subset(loan, CreditScoreRangeLower > 660),
  aes(x=BorrowerAPR, y=LoanOriginalAmount, color=CreditScoreRangeUpper)) +
  geom_point(alpha=0.09, position='jitter') +
  scale_colour_gradient(low="yellow", high="brown") +
  ggtitle("Loan Amount by Credit Score and Interest Rate") +
  facet_wrap(~year) +
  theme_bw()

Description :

Borrowers with high credit scores appear in the brown region on the left side. They generally have lower interest rates and larger loan amounts. In 2013 and 2014, many more yellow dots (borrowers with a credit score of ~700) are visible.

REFLECTION:

The Prosper Loan data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

I explored the data set and tried my best to find important relations between the variables. I converted the date into a year, stored it in a separate column, and used that column to plot the yearly trend. I also explored various themes and ggplot techniques to produce visually appealing and easily understandable plots. The main difficulty I had with the data was understanding the variables and then selecting the appropriate ones to analyze, since there are obviously a lot of variables to explore. Many variables are still unexplored, and I hope to explore them in the near future.

Some limitations of the dataset: too many NA values, too many columns containing NAs, and variables that are not easy to interpret.

There is a high probability of outliers affecting the distributions, as the dataset is very large.